Abstract. In this paper, we present a factor 16 approximation algorithm for the following
NP-hard distance fitting problem: given a finite set X and a distance d on X, find a
Robinsonian distance $d_R$ on X minimizing the $\ell_\infty$-error
$\|d - d_R\|_\infty = \max_{x,y \in X} \{|d(x,y) - d_R(x,y)|\}$.
dR{x, y)\}. A distance d_R on a finite set X is Robinsonian if its matrix can be symmetrically 
permuted so that its elements do not decrease when moving away from the main diagonal 
along any row or column. Robinsonian distances generalize ultrametrics, line distances 
and occur in the seriation problems and in classification. 



1. Introduction 

1.1. Seriation problem. Many applied algorithmic problems involve ordering of a set of 
objects so that closely coupled objects are placed near each other. These problems occur in 
such diverse applications as data analysis, archeological dating, numerical ecology, matrix 
visualization methods, DNA sequencing, overlapping clustering, graph linear arrangement, 
and sparse matrix envelope reduction. For example, a major issue in classification and 
data analysis is to visualize simple geometrical and relational structures between objects. 
Necessary for such an analysis is a dissimilarity on a set of objects, which is measured directly 
or computed from a data matrix. The classical seriation problem [16, 18] consists in finding
a simultaneous permutation of the rows and the columns of the dissimilarity matrix
with the objective of revealing an underlying one-dimensional structure. The basic idea is 
that small values should be concentrated around the main diagonal as closely as possible, 
whereas large values should fall as far from it as possible. This goal is best achieved by 
considering the so-called Robinson property [20]: a dissimilarity matrix has this property 
if its values do not decrease when moving away from the main diagonal along any row or 
column. Experimental data usually contain errors, whence the dissimilarity can be measured
only approximately. As a consequence, any simultaneous permutation of the rows and
the columns of the dissimilarity matrix gives a matrix which fails to satisfy the Robinson 
property, and we are led to the problem of finding a matrix reordering which is as close as 
possible to a Robinson matrix. As an error measure one can use the $\ell_p$-distance between
two matrices. Several heuristics for seriation using Robinson matrices have been considered
in the literature (the package seriation [14] contains their implementation). However, these
methods either have exponential complexity or do not provide any optimality guarantee for
the obtained solutions. In this paper, we provide a factor 16 approximation algorithm for the
NP-hard problem of optimally fitting a dissimilarity matrix by a Robinson matrix under the
$\ell_\infty$-error.
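
For illustration, the Robinson property can be tested directly on a matrix. The following
Python sketch is not taken from the seriation package; the names `permute` and `is_robinson`
are hypothetical, and the input is assumed to be a symmetric matrix stored as a list of lists.

```python
def permute(matrix, order):
    """Apply the same permutation `order` to the rows and the columns."""
    return [[matrix[i][j] for j in order] for i in order]


def is_robinson(matrix):
    """Check that values never decrease when moving away from the diagonal.

    Assumes `matrix` is symmetric, so scanning the rows also covers the columns.
    """
    n = len(matrix)
    for i in range(n):
        # Moving right from the diagonal along row i.
        for j in range(i, n - 1):
            if matrix[i][j] > matrix[i][j + 1]:
                return False
        # Moving left from the diagonal along row i.
        for j in range(i, 0, -1):
            if matrix[i][j] > matrix[i][j - 1]:
                return False
    return True
```

For the line-distance of three points at positions 0, 1, 3, the matrix in the natural order
passes the test, while the order (0, 2, 1) of the same points fails it.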

1.2. Definitions and the problem. Let X be a set of n elements to sequence, endowed
with a dissimilarity function $d : X^2 \to \mathbb{R}^+ \cup \{0\}$ (i.e., $d(x,y) = d(y,x) \ge 0$ and
$d(x,x) = 0$). A dissimilarity d and a total order $\prec$ on X are compatible if
$d(x,y) \ge d(u,v)$ for any four elements such that $x \prec u \prec v \prec y$. Then d is
Robinsonian if it admits a compatible order. Basic examples of Robinson dissimilarities are the
ultrametrics and the standard line-distance between n points on the line. Denote by
$\mathcal{D}$ and $\mathcal{R}$ the sets of all dissimilarities and of all Robinson dissimilarities
on X. For $d, d' \in \mathcal{D}$, define the $\ell_\infty$-error by
$\|d - d'\|_\infty = \max_{x,y \in X} \{|d(x,y) - d'(x,y)|\}$. To formulate the corresponding
fitting problem, we relax the notions of compatible order and Robinson dissimilarity. Given
$\epsilon \ge 0$, a total order $\prec$ on X is called $\epsilon$-compatible if
$x \prec u \prec v \prec y$ implies $d(x,y) + 2\epsilon \ge d(u,v)$. An $\epsilon$-Robinsonian
dissimilarity is a dissimilarity admitting an $\epsilon$-compatible order, i.e., for each pair
$x, y \in X$ one can pick a value $d_R(x,y) \in [d(x,y) - \epsilon, d(x,y) + \epsilon]$ so that the
resulting dissimilarity $d_R$ is Robinsonian. In this paper, we study the following NP-hard [8]
optimization problem:

Problem $\ell_\infty$-FITTING-BY-ROBINSON: Given $d \in \mathcal{D}$, find a Robinson dissimilarity
$d_R \in \mathcal{R}$ minimizing the $\ell_\infty$-error $\|d - d_R\|_\infty$, i.e., find a least
$\epsilon$ such that d is $\epsilon$-Robinsonian.
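
The two notions used in the problem statement can be made concrete with a short Python
sketch. The function names are hypothetical, dissimilarities are assumed to be symmetric lists
of lists indexed by 0..n-1, and the quadruples are taken with repetitions allowed (x = u or
v = y), which is an assumption matching the reflexive reading of the order.

```python
def linf_error(d, d2):
    """l_infinity-error between two dissimilarities on the same index set."""
    n = len(d)
    return max(
        (abs(d[x][y] - d2[x][y]) for x in range(n) for y in range(n) if x != y),
        default=0,
    )


def is_eps_compatible(d, order, eps):
    """Brute-force test: x, u, v, y in this order imply d(x,y) + 2*eps >= d(u,v)."""
    n = len(order)
    for i in range(n):
        for j in range(i, n):
            outer = d[order[i]][order[j]] + 2 * eps
            # Every pair (u, v) nested between positions i and j must not exceed `outer`.
            for k in range(i, j + 1):
                for l in range(k, j + 1):
                    if d[order[k]][order[l]] > outer:
                        return False
    return True
```

The test runs in O(n^4) time, which is enough for checking small examples by hand.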

1.3. Related work. Fitting general distances by simpler distances (alias low-distortion 
embeddings) is a classical problem in mathematics, data analysis, phylogeny, and, more 
recently, in computer science. We review here only the results about $\ell_\infty$-fitting of
distances (this error measure is also known as the maximum additive distortion or the maximum
additive two-sided error [5]). Farach et al. [13] showed that $\ell_\infty$-fitting of a distance d
by an ultrametric is polynomial. This result has been used by Agarwala et al. [1] to design
a factor 3 approximation algorithm for $\ell_\infty$-fitting of distances by tree-distances, a
problem which has been shown to be strongly NP-hard [1]. A unified and simplified treatment of
these results of [1, 13] using sub-dominants was given in [?]. A factor 2 approximation
algorithm for the NP-hard problem of $\ell_\infty$-fitting of a dissimilarity by a line-distance
was given by Håstad et al. [15]. Bădoiu [1] proposed a constant-factor algorithm for
$\ell_\infty$-fitting of distances by $\ell_1$-distances in the plane.

Seriation is important in archeological dating, clustering hypertext orderings, numerical 
ecology, sparse matrix ordering, matrix visualization methods, and DNA sequencing
[3, 6, 16, 18, 19, 20]. A package seriation implementing various seriation methods is described in
[14]. The most common methods for clustering provide a visual display of data in the form
of dendrograms. Dissimilarities in perfect agreement with dendrograms (i.e., ultrametrics)
are Robinsonian. Generalizing this correspondence, [11, 12] establish that the Robinson
dissimilarities can be visualized by hierarchical structures called pyramids.

1.4. Our result and techniques. The main result of the paper is a factor 16 approximation
algorithm for the problem $\ell_\infty$-FITTING-BY-ROBINSON. The basic setting of our
algorithm goes as follows. First we show that the optimal error $\epsilon^*$ belongs to a
well-defined list $\Delta$ of size $O(n^4)$. As in some other minmax problems, our approximation
algorithm tests the entries of $\Delta$, using a parameter $\epsilon$, which is the "guess" for
$\epsilon^*$. For the current $\epsilon \in \Delta$, the algorithm either finds that no
$\epsilon$-compatible order exists, in which case the input dissimilarity d is not
$\epsilon$-Robinsonian, or it returns a $16\epsilon$-compatible order. Now, if $\epsilon$ is the
least value for which the algorithm does not return the negative answer, then
$\epsilon^* \ge \epsilon$, and the returned $16\epsilon$-Robinsonian dissimilarity has
$\ell_\infty$-error at most $16\epsilon^*$, establishing that we have a factor 16 approximation
algorithm.
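
The outer loop of this scheme can be sketched as follows. This is only an illustration under
stated assumptions: `try_order` is a hypothetical callback standing for the subroutine that,
for a given guess, returns either an order or `None` (the negative answer), and the
candidates are scanned in increasing order.

```python
def approximate_fit(candidates, try_order):
    """Return the least candidate eps accepted by `try_order`, with its order.

    `candidates` is the list of possible optimal errors; `try_order(eps)` is
    assumed to return None when no eps-compatible order exists.
    """
    for eps in sorted(set(candidates)):
        order = try_order(eps)
        if order is not None:
            # Every smaller candidate was rejected, so the optimum is at least eps.
            return eps, order
    return None
```

For instance, with a toy callback accepting every guess of at least 2, the call
`approximate_fit([3, 1, 2, 2], lambda e: [0, 1] if e >= 2 else None)` stops at the guess 2.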

For $\epsilon \in \Delta$, a canonical binary relation $\preceq$ is computed so that any
$\epsilon$-compatible total order refines $\preceq$ or its dual. If $\preceq$ is not a partial
order, then the algorithm halts and returns the negative answer. If $\preceq$ is a total order,
then we are done. Otherwise, we select a maximal chain $P = (a_1, a_2, \ldots, a_p)$ of the partial
order $\preceq$ and search to fit each element of $X^\circ := X \setminus P$ between two
consecutive elements of P. We say that $a_i, a_{i+1} \in P$ form a hole $H_i$ and that all
elements $x \in X^\circ$ assigned between $a_i$ and $a_{i+1}$ are located in $H_i$. This
distribution of the elements to holes is performed so that (a) all elements $X_i$ of $X^\circ$
located in the same hole $H_i$ must "fit" in this hole, i.e., for all $x, y \in X_i$ one of the
orders $a_i \prec x \prec y \prec a_{i+1}$ or $a_i \prec y \prec x \prec a_{i+1}$ must be
$c\epsilon$-compatible for some $c \le 12$. Partitioning $X^\circ$ into sets $X_i$,
$i = 1, \ldots, p-1$, is not obvious. Even if such a partition is available, we cannot directly
apply a recursive call to each $X_i$, because (b) the elements located outside the hole $H_i$
will impose a certain order on the elements of $X_i$ and, since we tolerate some errors, (c) we
cannot ensure that $X_i$ is exactly the set of elements which must be located in $H_i$ in some
$\epsilon$-compatible total order. To deal with (a), we give a classification of admissible
and pairwise admissible holes for elements of $X^\circ$. This allows us to show that, if we
tolerate a $12\epsilon$-error, then each element $x \in X^\circ$ can be located in the leftmost
or rightmost admissible hole for x (we call them bounding holes of x). Both locations are
feasible unless several elements have the same pair of bounding holes. For $i < j$, let $X_{ij}$
be the set of all elements of $X^\circ$ having $H_i$ and $H_{j-1}$ as bounding holes. To deal
with (b) and (c), on each set $X_{ij}$ we define a directed graph $\vec{\mathcal{L}}_{ij}$. The
strongly connected components (which we call cells) of $\vec{\mathcal{L}}_{ij}$ have the
property that in any $\epsilon$-compatible order all elements of the same component must be
located in the same hole. In fact the cells (and not the sets $X_i$) are the units to which we
apply the recursive calls in the algorithm. To decide in which hole $H_i$ or $H_{j-1}$ to locate
each cell of $\vec{\mathcal{L}}_{ij}$ and to define the relative order between the cells
assigned to the same hole, we define another directed graph $\mathcal{G}_{ij}$ whose vertices
are the cells of $\vec{\mathcal{L}}_{ij}$ in such a way that (i) if some $\mathcal{G}_{ij}$ does
not admit a partition into two acyclic subgraphs then no $\epsilon$-compatible order exists and
(ii) if $\mathcal{G}_{ij}$ has a partition into two acyclic subgraphs $\mathcal{G}^-_{ij}$ and
$\mathcal{G}^+_{ij}$, then all cells of $\mathcal{G}^-_{ij}$ will be located in $H_i$, all cells
of $\mathcal{G}^+_{ij}$ will be located in $H_{j-1}$, and the topological ordering of each of
these graphs defines the relative order between the cells. To partition $\mathcal{G}_{ij}$ into
two acyclic subgraphs (this problem in general is NP-complete [17]), we investigate the specific
properties of the graphs in question, allowing us to define a 2-SAT formula $\Phi_{ij}$ which is
satisfiable if and only if the required bipartition of $\mathcal{G}_{ij}$ exists. Finally, to
locate in each hole $H_i$ the cells coming from different subgraphs $\mathcal{G}^+_{j'(i+1)}$,
$\mathcal{G}^-_{ij}$, and $\mathcal{G}^-_{ij''}$ with $j' < i < j < j''$, we use the following
separation rule: the cells of $\mathcal{G}^+_{j'(i+1)}$ are located to the left of the cells of
$\mathcal{G}^-_{ij}$ and the cells of $\mathcal{G}^-_{ij}$ are located to the right of the cells
of $\mathcal{G}^-_{ij''}$. Due to space constraints, all missing proofs are given in the full
version [9].

2. Preliminary results 

The $\prec$-restricted problem is obtained from $\ell_\infty$-FITTING-BY-ROBINSON by fixing
the total order $\prec$ on X. Let $d_\prec$ be a dissimilarity defined by setting
$d_\prec(x,y) = \max\{d(u,v) : x \prec u \prec v \prec y\}$ for all $x, y \in X$ with $x \prec y$
(we suppose here that $a \prec a$ for any $a \in X$). Let $2\epsilon_\prec = \|d - d_\prec\|_\infty$
and let $\hat{d}_\prec$ be the (Robinsonian) dissimilarity obtained from $d_\prec$ by setting
$\hat{d}_\prec(x,y) = \max\{d_\prec(x,y) - \epsilon_\prec, 0\}$ for all $x, y \in X$, $x \ne y$.
Then, the following holds:

Proposition 2.1. For a total order $\prec$ on X and $d \in \mathcal{D}$, $\hat{d}_\prec$ is an
optimal solution of the $\prec$-restricted problem, i.e., it minimizes $\|d - d'\|_\infty$ among
all dissimilarities $d'$ compatible with $\prec$.

Proposition 2.1 establishes that an optimal solution of the problem
$\ell_\infty$-FITTING-BY-ROBINSON can be selected among n! Robinsonian dissimilarities of the
form $\hat{d}_\prec$. In the full version, we show that the natural heuristic similar to the
factor 3 approximation algorithms of Håstad et al. [15] and Agarwala et al. [1] (which instead
of n! total orders considers only n orders) does not provide a constant-factor approximation
algorithm for our problem. Proposition 2.1 also implies that the optimal error $\epsilon^*$ in
$\ell_\infty$-FITTING-BY-ROBINSON belongs to a well-defined list
$\Delta = \{\frac{1}{2}|d(x,y) - d(x',y')| : x, y, x', y' \in X\}$ of size $O(n^4)$.
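
The construction of the restricted solution and of the candidate list can be sketched in
Python as follows. The function names are hypothetical; `order` is assumed to list the indices
0..n-1 in the fixed total order, and the dynamic program relies on the reflexive convention
stated above.

```python
def fit_for_order(d, order):
    """Return (eps, fitted): the error and the fitted dissimilarity for a fixed order."""
    n = len(order)
    # m[i][j] = max of d over all pairs nested between positions i and j.
    m = [[0.0] * n for _ in range(n)]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            m[i][j] = max(d[order[i]][order[j]], m[i + 1][j], m[i][j - 1])
    two_eps = max(
        (m[i][j] - d[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)),
        default=0.0,
    )
    eps = two_eps / 2
    fitted = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = max(m[i][j] - eps, 0.0)
            fitted[order[i]][order[j]] = fitted[order[j]][order[i]] = value
    return eps, fitted


def candidate_errors(d):
    """Sorted list of the values |d(x,y) - d(x',y')| / 2 (pairs with x = y included)."""
    n = len(d)
    values = {d[x][y] for x in range(n) for y in range(x + 1, n)} | {0}
    return sorted({abs(a - b) / 2 for a in values for b in values})
```

On the dissimilarity with d(0,1) = 3, d(1,2) = 2, d(0,2) = 1 and the order (0, 1, 2), the
sketch raises the entry of the pair (0, 2) to 3, obtains an error of 1, and shifts all
entries down by 1.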

Given $d \in \mathcal{D}$ and $\epsilon \in \Delta$, we define a partial order $\preceq$ such
that every $\epsilon$-compatible total order $\prec$ refines either $\preceq$ or its dual. For
this, we set $p \preceq q$ for two arbitrary elements $p, q \in X$, and close $\preceq$ using the
properties of partial orders and the following observation: if
$d(x,y) > \max\{d(x,z), d(z,y)\} + 2\epsilon$, then in all orders $\epsilon$-compatible with d
the element z must be located between x and y. In this case, if we know that two of the elements
x, z, y are in relation $\preceq$ then we can extend this relation to the whole triplet. For
example, if we know that $x \preceq z$, then we conclude that also $z \preceq y$ and
$x \preceq y$. If the resulting $\preceq$ is not a partial order, then d does not admit an
$\epsilon$-compatible total order. So, further let $\preceq$ be a partial order. For two
disjoint subsets A, B of X, set $A \preceq B$ if $a \preceq b$ for any $a \in A$ and $b \in B$.
We write $x \parallel y$ if neither $x \preceq y$ nor $y \preceq x$ holds. For two numbers
$\alpha$ and $\beta$ we will use the following notations: (i) $\alpha \approx_c \beta$ if
$|\alpha - \beta| \le c\epsilon$, (ii) $\beta \gtrsim_c \alpha$ if $\beta \ge \alpha - c\epsilon$,
and (iii) $\beta \gg_c \alpha$ if $\beta > \alpha + c\epsilon$. We continue with basic
properties of the canonical partial order $\preceq$: If $w \preceq \{v, z\}$, $v \parallel z$,
$u \preceq v$, $u \parallel z$, and $w \parallel u$, then: (i) $d(v,w) \approx_2 d(z,w)$;
(ii) $d(v,z) \lesssim_2 \min\{d(v,w), d(z,w)\}$; (iii) $d(w,z) \approx_4 \{d(u,v), d(u,z)\}$;
(iv) $d(w,u) \lesssim_2 \min\{d(w,v), d(u,v)\}$.
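
The observation that seeds the canonical partial order, together with the comparison
notations, can be sketched in Python. All names are hypothetical; the non-strict inequality
in `approx` is an assumption about the intended definition, and the closure of the relation
under the properties of partial orders is not shown.

```python
from itertools import combinations


def approx(alpha, beta, c, eps):
    """alpha and beta differ by at most c * eps."""
    return abs(alpha - beta) <= c * eps


def much_greater(beta, alpha, c, eps):
    """beta exceeds alpha by more than c * eps."""
    return beta > alpha + c * eps


def forced_between(d, eps):
    """Triples (x, z, y) such that z lies between x and y in every eps-compatible order."""
    n = len(d)
    triples = []
    for x, y in combinations(range(n), 2):
        for z in range(n):
            if z != x and z != y and d[x][y] > max(d[x][z], d[z][y]) + 2 * eps:
                triples.append((x, z, y))
    return triples
```

For the line-distance of points at positions 0, 1, 3 and a zero error, the only forced
triple places the middle point between the two extreme ones.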

3. Pairwise admissible holes

3.1. Admissible holes. Let $P = (a_1, a_2, \ldots, a_{p-1}, a_p)$ be a maximal chain of the
partial order $\preceq$. For notational convenience, we assume that all elements of $X^\circ$
must be located between $a_1$ and $a_p$ ($a_1$ and $a_p$ can be artificially added); this way,
every element of $X^\circ$ must be located in a hole. Let $H_{ij}$ be the union of all holes
comprised between $a_i$ and $a_j$. For $x \in X^\circ$, denote by $H(x)$ the union of all holes
$H_i$ such that $x \parallel a_i$ or $x \parallel a_{i+1}$. If $H(x) = H_{ij}$, the holes $H_i$
and $H_{j-1}$ are called bounding holes; see Fig. 1 (note that
$a_i = \max\{a_k \in P : a_k \preceq x\}$ and $a_j = \min\{a_k \in P : x \preceq a_k\}$ for
$x \in X^\circ$). All other holes of $H(x)$ are called inner holes. Since $x \notin P$, $H(x)$
contains at least two holes. The hole $H_k$ of $H(x)$ is x-admissible, if the total order on
$P \cup \{x\}$ obtained from $\preceq$ by adding the relation $a_k \preceq x \preceq a_{k+1}$ is
$\epsilon$-compatible with d. It can be easily shown that the bounding holes of $H(x)$ must be
x-admissible. Denote by $d_x$ the mean value of $\min\{d(x, a_k) : i < k < j\}$ and
$\max\{d(x, a_k) : i < k < j\}$. We call $\delta_k = d(a_k, a_{k+1})$ the size of the hole $H_k$.
Then the following holds:

Lemma 3.1. If an inner hole $H_k$ of $H(x)$ is x-admissible, then
$d_x \approx_1 \{d(x, a_k), d(x, a_{k+1})\} \approx_2 \delta_k$. In particular,
$\delta_k \approx_3 d_x$. More generally, for all $k, k' \in\, ]i, j[$, we have
$d_x \approx_3 d(a_k, a_{k'})$.
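
The two quantities entering this lemma are simple to compute. The following Python sketch
uses hypothetical names and 0-based positions in the chain; `i` and `j` are assumed to be the
positions of the chain elements delimiting $H(x)$, with at least one position strictly
between them.

```python
def hole_sizes(d, chain):
    """Sizes d(a_k, a_{k+1}) of the holes between consecutive chain elements."""
    return [d[chain[k]][chain[k + 1]] for k in range(len(chain) - 1)]


def mean_inner_distance(d, chain, x, i, j):
    """Mean of the smallest and largest d(x, a_k) over positions i < k < j."""
    inner = [d[x][chain[k]] for k in range(i + 1, j)]
    return (min(inner) + max(inner)) / 2
```

With chain points at positions 0, 2, 4, 6 on a line and an extra point at position 3, all
holes have size 2 and the mean inner distance of the extra point is 1.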

3.2. Pairwise admissible holes. A pair $\{H_k, H_{k'}\}$ of holes is called (x,y,c)-admissible
if $H_k$ is x-admissible, $H_{k'}$ is y-admissible, and the total order on $P \cup \{x, y\}$
obtained by adding to $\preceq$ the relations $a_k \preceq x \preceq a_{k+1}$ and
$a_{k'} \preceq y \preceq a_{k'+1}$ is $c\epsilon$-compatible. Denote by $AH(x)$ the set of all
x-admissible holes $H_k$ so that for each $y \in X^\circ$, $y \ne x$, there exists a
y-admissible hole $H_{k'}$ such that $\{H_k, H_{k'}\}$ is an (x,y,1)-admissible pair. Further we
can assume

Figure 1: Bounding holes and the partition of $X_{ij}$ into $X^-_{ij}$ and $X^+_{ij}$

that for any $x \in X^\circ$ the bounding holes of $H(x) = H_{ij}$ belong to $AH(x)$. Otherwise,
if say $H_i \notin AH(x)$, then $a_{i+1} \prec x$ in any $\epsilon$-compatible total order
$\prec$ extending $\preceq$, thus we can augment the canonical partial order $\preceq$ by setting
$a_{i+1} \preceq x$ and by reducing the segments $H(x)$ accordingly. Next we investigate the
pairwise admissible locations of x and y as a function of the mutual geometric location of the
segments $H(x)$ and $H(y)$ and of the values $d(x,y)$, $d_x$, and $d_y$. We distinguish the
following cases: (H1) $H(x) = H(y)$; (H2) $H(x)$ and $H(y)$ are disjoint; (H3) $H(x)$ and $H(y)$
overlap in at least 2 holes ($H(x) \circ H(y)$); (H4) $H(x)$ and $H(y)$ overlap in a single hole
($H(x) * H(y)$); (H5) $H(y)$ is a proper subinterval of $H(x)$ ($H(y) \Subset H(x)$). This
classification of pairs $\{x, y\}$ of $X^\circ$ is used in the design of our approximation
algorithm. Also the proofs of several results employ a case analysis based on (H1)-(H5). We
continue with the following result. It specifies the constraints on pairs of elements under
which each element of $X^\circ$ can be located in one of its bounding holes.

Proposition 3.2. For two elements $x, y \in X^\circ$, any location of x in a bounding hole of
$H(x) = H_{ij}$ and any location of y in a bounding hole of $H(y) = H_{i'j'}$ is
(x,y,12)-admissible, unless $H(x) = H(y)$ and $d(x,y) \gg_3 \max\{d_x, d_y\}$ or
$d(x,y) \ll_3 \max\{d_x, d_y\}$, subject to the following three constraints: (i) if
$H(x) \Subset H(y)$ and x and y are located in a common bounding hole, then x is between y and
$a_{i+1}$; (ii) if $H(x) * H(y)$, then $i < i'$ implies $x \prec y$; (iii) if $H(x) = H(y)$, x and
y are located in the same bounding hole, and $d_y \ll_4 d_x$, then y is between x and $a_{i+1}$.
If $H(x) = H(y)$ and $d(x,y) \gg_3 \max\{d_x, d_y\}$, then the only (x,y,1)-admissible locations
are the two locations of x and y in different bounding holes. If $H(x) = H(y)$ and
$d(x,y) \ll_3 \max\{d_x, d_y\}$, then any (x,y,1)-admissible location is in common x- and
y-admissible holes.



4. Distributing elements to holes 

In this section, we describe how, for each hole $H_i$, to compute the set $X_i$ of elements of
$X^\circ$ which will be located in $H_i$. This set consists of some x such that $H_i$ is a
bounding hole of $H(x)$. Additionally, each $X_i$ will be partitioned into an ordered list of
cells, to which we perform recursive calls. Let $X_{ij}$ consist of all $x \in X^\circ$ such that
$H(x) = H_{ij}$. The sets $X_{ij}$ form a partition of $X^\circ$. In the next subsections, we
will show how to partition each $X_{ij}$ into two subsets $X^-_{ij}$ and $X^+_{ij}$, so that
$X^-_{ij}$ will be located in $H_i$ and $X^+_{ij}$ in $H_{j-1}$; see Fig. 1.

4.1. Blocks, cells, and clusters. Two elements $x, y \in X_{ij}$ are called linked (separated)
if in all (x,y,1)-admissible locations x and y must be placed in the same hole (in distinct
bounding holes). Two subsets A and B of $X_{ij}$ must be separated if all $x \in A$ and
$y \in B$ are separated. Let $S_{ij}$ and $L_{ij}$ be the sets of all pairs $x, y \in X_{ij}$ such
that $d(x,y) \gg_3 \max\{d_x, d_y\}$, resp., $d(x,y) \ll_3 \max\{d_x, d_y\}$. By Proposition 3.2
all pairs of $S_{ij}$ are separated and all pairs of $L_{ij}$ are linked. Since "be linked" is
an equivalence relation, all vertices of the same connected component (called block) of the
graph $\mathcal{L}_{ij} = (X_{ij}, L_{ij})$ are linked. We continue by investigating in which
cases two blocks of $\mathcal{L}_{ij}$ are separated or linked. For $x, y \in X_{ij}$, set
$x \lhd y$ iff (A1) $d_x \ll_4 d_y$ or (A2) $d_x \gtrsim_4 d_y$ and there exists $z \in X_{ij}$
such that $xz, yz \notin L_{ij}$ and $d(x,z) \ll_{16} d(y,z)$. If $x, y, z \in X_{ij}$ satisfy
(A2), then it can be shown that y and z are strongly separated, i.e.,
$d(y,z) \gg \max\{d_y, d_z\}$. Additionally, we show that if $x \lhd y$, then $x \prec y$ in all
$\epsilon$-compatible orders $\prec$ such that $a_{i+1} \prec \{x, y\}$ and $y \prec x$ in all
$\epsilon$-compatible orders $\prec$ such that $\{x, y\} \prec a_{j-1}$.

On $X_{ij}$ we define a directed graph $\vec{\mathcal{L}}_{ij}$: we draw an arc $x \to y$ iff
(L1) $x \lhd y$ and x, y belong to a common block of $\mathcal{L}_{ij}$ or (L2)
$d(x,y) \ll_5 \max\{d_x, d_y\}$. If (L2) is satisfied, then $xy \in L_{ij}$ and $y \to x$ hold.
The strongly connected components of $\vec{\mathcal{L}}_{ij}$ are called cells. Every block is
a disjoint union of cells. Indeed, if x, y belong to a common cell, let R be a directed path of
$\vec{\mathcal{L}}_{ij}$ from x to y. Pick any arc $u \to v$ of R. If it has type (L2), then
$uv \in L_{ij}$. Otherwise, if $u \to v$ has type (L1), then u and v belong to a common block.
Thus the ends of all arcs of any path between x, y belong to a common block.
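
Blocks and cells are ordinary connected and strongly connected components, so they can be
computed with standard graph routines. The Python sketch below is an illustration with
hypothetical names: the linked pairs and the arcs are assumed to be given as lists of pairs,
blocks are obtained with a union-find structure, and cells with a two-pass (Kosaraju-style)
depth-first search.

```python
def connected_components(vertices, edges):
    """Blocks: connected components of an undirected graph, via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


def strongly_connected_components(vertices, arcs):
    """Cells: strongly connected components of a directed graph."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    # First pass: record vertices in order of completion of an iterative DFS.
    finished, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                finished.append(node)
                stack.pop()
    # Second pass: explore the reversed graph in decreasing completion order.
    assigned, cells = set(), []
    for root in reversed(finished):
        if root in assigned:
            continue
        assigned.add(root)
        cell, todo = [], [root]
        while todo:
            node = todo.pop()
            cell.append(node)
            for prev in pred[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        cells.append(sorted(cell))
    return sorted(cells)
```

Both functions return sorted lists of sorted vertex lists, which makes the output
deterministic for small hand-made examples.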

Lemma 4.1. Let $x, x', y \in X_{ij}$. If x, x' belong to a common cell, but {x, x'} and y belong
to distinct blocks, then there does not exist an $\epsilon$-compatible order such that
$x \prec y \prec x'$.

Lemma 4.2. For cells C', C'', if $x, x' \in C'$, $y, y' \in C''$, and $x \lhd y$, $y' \lhd x'$,
then C' and C'' must be separated.

Proof. Let B', B'' be the blocks containing C', C''. If B' = B'', as $x \lhd y$ and
$y' \lhd x'$, they are (L1)-arcs, hence $x \to y$ and $y' \to x'$. This is impossible since
{x, x'} and {y, y'} belong to distinct cells. Thus $B' \ne B''$. By Lemma 4.1, if we locate
x, x', y, y' in the same bounding hole $H_{j-1}$, either $\{x, x'\} \prec \{y, y'\}$ or
$\{y, y'\} \prec \{x, x'\}$ holds. On the other hand, $x \lhd y$, $y' \lhd x'$ imply that
$x \prec y$ and $y' \prec x'$. Thus C' and C'' must be separated. ■

Now, let $\mathcal{S}_{ij}$ be a graph having cells as vertices and an edge between two cells
C', C'' iff (S1) there exist $x, y \in X_{ij}$, x in the same block as C' and y in the same block
as C'' such that $xy \in S_{ij}$ or (S2) there exist x, x' in the same block as C' and y, y' in
the same block as C'' such that each pair xx' and yy' belong to a common cell, and $x \lhd y$,
$y' \lhd x'$. By Proposition 3.2 and Lemma 4.2, in cases (S1) and (S2) the sets C' and C'' must
be separated. The graph $\mathcal{S}_{ij}$ must be bipartite, otherwise no $\epsilon$-compatible
order exists. Now, for each connected component of $\mathcal{S}_{ij}$ consider its canonical
bipartition {A', A''}, and draw an edge between any two cells, one from A' and another from A''.
Denote the obtained graph also by $\mathcal{S}_{ij}$. Call the union of cells from A' (or from
A'') a cluster. The clusters $\mathcal{K}'$ and $\mathcal{K}''$ of A' and A'' are called twins.
From the construction, we immediately obtain that all elements of a cluster are linked and two
twin clusters are separated. A connected bipartite component $\{\mathcal{K}', \mathcal{K}''\}$
of $\mathcal{S}_{ij}$ is called a principal component if there exist $x \in \mathcal{K}'$ and
$y \in \mathcal{K}''$ such that x and y are strongly separated.

4.2. Partitioning $X_{ij}$ into $X^-_{ij}$ and $X^+_{ij}$. We describe how to partition $X_{ij}$
into the subsets $X^-_{ij}$ and $X^+_{ij}$. For this, we define a directed graph
$\mathcal{G}_{ij}$ having cells as vertices, and an arc $C' \to C$ with tail C' and head C exists
iff one of the following conditions is satisfied: (G1) C and C' belong to twin clusters of
$\mathcal{S}_{ij}$; (G2) C and C' are not connected by (G1)-arcs and there exist $x \in C$ and
$x' \in C'$ such that $d_{x'} \ll_4 d_x$; (G3) C and C' are not connected by (G1)- or (G2)-arcs
and there exist $x \in C$, $x' \in C'$, and $z \in X_{ij}$ such that $xz, x'z \notin L_{ij}$ and
$d(x',z) \ll_{16} d(x,z)$. A head of a (G3)-arc is called a (G3)-cell. A (Gi)-cycle is a directed
cycle of $\mathcal{G}_{ij}$ with arcs of type (Gi), i = 1, 2, 3. The (G1)-cycles are exactly the
cycles of length 2. A mixed cycle is a directed cycle containing arcs of types (G2) and (G3).
Finally, an induced cycle is a directed cycle $\mathcal{C}$ such that for two cells
$C, C' \in \mathcal{C}$ we have $C' \to C$ if and only if C is the successor of C' in
$\mathcal{C}$. Our next goal is to establish that either the set of cells can be partitioned
into two subsets such that the subgraphs of $\mathcal{G}_{ij}$ induced by these subsets do not
contain directed cycles or no $\epsilon$-compatible order exists. Deciding if a directed graph
can be partitioned into two acyclic subgraphs is NP-complete [17]. In our case, this can be done
in polynomial time by exploiting the structure of $\mathcal{G}_{ij}$.
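
While finding such a bipartition is hard in general, verifying a proposed one is easy. The
following Python sketch (hypothetical names, arcs given as pairs of hashable cell
identifiers) checks with Kahn's topological sorting procedure that both induced subgraphs
are acyclic; it is a verification aid only, not the partitioning procedure itself.

```python
from collections import deque


def is_acyclic(vertices, arcs):
    """Kahn's algorithm restricted to the subgraph induced by `vertices`."""
    vertices = set(vertices)
    indegree = {v: 0 for v in vertices}
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        if u in vertices and v in vertices:
            succ[u].append(v)
            indegree[v] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # All vertices are removed exactly when the induced subgraph has no directed cycle.
    return removed == len(vertices)


def is_acyclic_bipartition(vertices, arcs, left):
    """Check that `left` and its complement both induce acyclic subgraphs."""
    left = set(left)
    right = set(vertices) - left
    return is_acyclic(left, arcs) and is_acyclic(right, arcs)
```

For a directed triangle, putting one vertex on one side breaks the cycle, whereas keeping
all three vertices on the same side does not.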

Lemma 4.3. If $\mathcal{C} = (C_1, C_2, \ldots, C_k, C_1)$ is a directed cycle of
$\mathcal{G}_{ij}$, then for any $\epsilon$-compatible order, $\mathcal{C}$ has a cell located
in the hole $H_i$ and a cell located in the hole $H_{j-1}$.

Proof. The assertion is obvious if $\mathcal{C}$ is a (G1)-cycle. So, suppose that all arcs of
$\mathcal{C}$ have type (G2) or (G3). The definition of cells implies that $\mathcal{C}$
contains two consecutive cells, say $C_1$ and $C_k$, which belong to different blocks. Suppose
that there exists an $\epsilon$-compatible order $\prec$ such that no element of
$\bigcup_{l=1}^{k} C_l$ is located in the hole $H_i = [a_i, a_{i+1}]$, i.e.,
$a_{i+1} \prec \bigcup_{l=1}^{k} C_l$. In each $C_l$ pick two elements $x_l, y_l$ such that
$x_l \lhd y_{l+1 \,(\mathrm{mod}\, k)}$. Then $x_l \prec y_{l+1 \,(\mathrm{mod}\, k)}$ for all
$l = 1, \ldots, k$. We divide the cells of $\mathcal{C}$ into groups: a group consists of all
consecutive cells of $\mathcal{C}$ belonging to one and the same block. The first group starts
with $C_1$, while the last group ends with $C_k$. We assert that if
$\{C_{l-q}, \ldots, C_l\}$ and $\{C_{l+1}, \ldots, C_{l+r}\}$ are two consecutive groups of
$\mathcal{C}$, then $C_l \prec C_{l+1} \cup \cdots \cup C_{l+r}$ (all indices here are modulo
k). Indeed, pick $u \in C_l$ and $v \in C_{l+1}$. Since $\{x_l, u\}$ and $\{y_{l+1}, v\}$ belong
to different blocks while each of these pairs belongs to a common cell, applying Lemma 4.1 to
each of the triplets of the quadruplet $x_l, u, y_{l+1}, v$, we infer that in the total order
$\prec$ none of $y_{l+1}, v$ is located between $x_l$ and u and none of $x_l, u$ is located
between $y_{l+1}$ and v. Since $x_l \prec y_{l+1}$, we conclude that
$\{x_l, u\} \prec \{y_{l+1}, v\}$, yielding $C_l \prec C_{l+1}$. Now, consider the cell
$C_{l+2}$. The element $y_{l+2}$ must be located to the right of $x_{l+1}$, therefore to the
right of $C_l$. Since $C_{l+2}$ and $C_l$ belong to different blocks, we can show that
$C_l \prec C_{l+2}$ by using exactly the same reasoning as for the cells $C_l$ and $C_{l+1}$.
Continuing this way, we obtain the required relationship
$C_l \prec C_{l+1} \cup \cdots \cup C_{l+r}$. This establishes the assertion. Suppose that
$[1, i_1], [i_1 + 1, i_2], \ldots, [i_r + 1, k]$ are the indices of cells defining the beginning
and the end of each group. From our assertion we infer that
$C_k \prec C_{i_1} \prec C_{i_2} \prec \ldots \prec C_{i_r} \prec C_k$, contrary to the
assumption that $\prec$ is a total order. ■

Lemma 4.4. If $C' \to C$ is a (G3)-arc and C' belongs to a principal component, then C and C'
belong to the same cluster. In particular, $\mathcal{G}_{ij}$ does not contain (G3)-cycles or no
$\epsilon$-compatible order exists. Moreover, $\mathcal{G}_{ij}$ does not contain (G2)-cycles.

Proof. Let xy be a strongly separated pair with $x \in C'$. Since $C' \to C$ is a (G3)-arc,
there exist $y' \in C'$ and $x' \in C$ such that $y' \lhd x'$ is an (A2)-arc. Then there exists
z' such that x'z' is strongly separated. If xy and x'z' belong to different principal
components, then there exists a (G2)-arc from C to C' or from C' to C. In the first case, C and
C' obey (S2), thus we cannot have a (G3)-arc from C' to C. Analogously, in the second case, we
deduce that we have at the same time a (G3)-arc and a (G2)-arc from C' to C. This is impossible,
so C and C' belong to a common principal component. Now, if $\mathcal{G}_{ij}$ contains a
(G3)-cycle, then the first assertion implies that all its cells belong to the same cluster, and
Lemma 4.3 yields that no $\epsilon$-compatible order exists. Finally, let
$\mathcal{C} = (C_1, C_2, \ldots, C_k, C_1)$ be a (G2)-cycle. In each $C_i$, pick $x_i, y_i$ so
that $d_{x_i} \ll_4 d_{y_{i+1 \,(\mathrm{mod}\, k)}}$. Since there is no (G2) or (G3) arc from
$C_{i+1 \,(\mathrm{mod}\, k)}$ to $C_i$, we get
$d_{y_i} \lesssim_4 d_{x_{i+1 \,(\mathrm{mod}\, k)}}$, yielding
$d_{x_i} \ll_4 d_{y_{i+1}} \lesssim_4 d_{x_{i+2}}$. Thus
$d_{x_i} < d_{x_{i+2 \,(\mathrm{mod}\, k)}}$ for $i = 1, \ldots, k$. Then
$d_{x_1} < d_{x_3} < \cdots < d_{x_{k-1}} < d_{x_1}$ for even k and
$d_{x_1} < d_{x_3} < \cdots < d_{x_k} < d_{x_2} < d_{x_4} < \cdots < d_{x_{k-1}} < d_{x_1}$ for
odd k, a contradiction. ■

To complete the bipartition of cells into two acyclic subgraphs of $\mathcal{G}_{ij}$, it
remains to deal with induced mixed cycles. The following results make their structure precise.

Lemma 4.5. Any induced mixed cycle $\mathcal{C}$ of $\mathcal{G}_{ij}$ contains one or two
(G2)-arcs, and if $\mathcal{C}$ contains two such arcs, then they are consecutive.

Lemma 4.6. Let $C' \to C$ be a (G3)-arc, $C \to C''$ be a (G2)-arc, and suppose that there
is no (G2)-arc from C' to C''. If C, C' do not belong to distinct twin clusters and C, C'' do
not belong to the same cluster, then C and C'' must be separated.

Thus a mixed cycle $\mathcal{C}$ contains either one (G2)-arc ($\mathcal{C}$ is a 1-cycle) or
two consecutive (G2)-arcs ($\mathcal{C}$ is a 2-cycle), all other arcs of $\mathcal{C}$ being
(G3)-arcs. By Lemma 4.4 the heads of all (G3)-arcs of $\mathcal{C}$ are (G3)-cells of the same
cluster $\mathcal{K}$. Then we say that the cycle $\mathcal{C}$ intersects the cluster
$\mathcal{K}$. For a (G2)-arc $C_0 \to C$ and a cluster $\mathcal{K}$, we show how to detect if
there exists a 1- or 2-cycle $\mathcal{C}$ passing via $C_0 \to C$ and intersecting
$\mathcal{K}$. We consider the case of 1-cycles. Then $C_0$ must be a (G3)-cell of
$\mathcal{K}$. Note that an induced 1-cycle cannot contain cells C' such that $C_0 \to C'$ is a
(G2) or (G3)-arc. Hence, we can remove all such cells of $\mathcal{K}$. Analogously, we remove
all cells C' so that $C' \to C$ is an arc. In the subgraph induced by the remaining cells of
$\mathcal{K}$ we search for a shortest directed path
$Q = C \to C_1 \to \cdots \to C_k \to C_0$ so that the first arc $C \to C_1$ and the last arc
$C_k \to C_0$ of this path are (G3)-arcs. This can be done in polynomial time by testing all
possible choices for $C_1$ and $C_k$ and applying for each pair a shortest path finding algorithm
in an acyclic graph. If such a path Q does not exist, then no required induced cycle
$\mathcal{C}$ exists. Otherwise, the path Q together with the arc $C_0 \to C$ define an induced
cycle $\mathcal{C}$ having exactly one (G2)-arc. Indeed, if $C_i \to C_j$ is a (G2) or (G3)-arc
and $|i - j| \ge 2$, since the subgraph induced by $\mathcal{K}$ is acyclic, we must have
$i < j$. This contradicts the minimality of the path Q. So, the resulting cycle is indeed
induced. It remains to note that $\mathcal{C}$ does not contain other (G2)-arcs, because by
Lemma 4.5 in an induced cycle the (G2)-arcs are consecutive. Analogously, we can decide if there
exists a 2-cycle passing via $C_0 \to C$ and intersecting $\mathcal{K}$, and having a second
(G2)-arc of the form $C \to C'$ or $C' \to C_0$. Therefore, we have the following result:

Lemma 4.7. For a (G2)-arc $C_0 \to C$ and a cluster $\mathcal{K}$, one can decide in polynomial
time if there exists an induced 1- or 2-cycle $\mathcal{C}$ passing via $C_0 \to C$ and
intersecting $\mathcal{K}$.

For a cell C, let $\Omega_1(C)$ be the set of (G2)-arcs $C_0 \to C$ belonging to a 1-cycle
intersecting a cluster $\mathcal{K}$ not containing C. Let $\Omega_2(C)$ be the set of (G2)-arcs
$C_0 \to C$ belonging to a 2-cycle $\mathcal{C}$ intersecting a cluster $\mathcal{K}$ not
containing C and passing via $C_0 \to C$ so that the arc of $\mathcal{C}$ entering $C_0$ is a
(G3)-arc. In both cases $C_0$ belongs to $\mathcal{K}$: $C_0$ is a head of a (G3)-arc of
$\mathcal{C}$, and all such heads belong to $\mathcal{K}$. Finally, let $\Omega_3(C)$ be the set
of (G2)-arcs $C \to C_0$ belonging to a 2-cycle $\mathcal{C}$ intersecting a cluster
$\mathcal{K}$, so that C belongs to $\mathcal{K}$ and the arc of $\mathcal{C}$ entering C has
type (G2). Fig. 2 illustrates this classification. For each cell C of $\mathcal{G}_{ij}$ we
introduce a binary variable $x_C$ satisfying the following constraints: (F1)
$x_{C'} = x_{C''}$, if C', C'' belong to the same cluster; (F2) $x_{C'} \ne x_{C''}$, if C', C''
belong to twin clusters; (F3) $x_C \ne x_{C_0}$, if the arc $C_0 \to C$ belongs to
$\Omega_1(C) \cup \Omega_2(C)$; (F4) $x_C \ne x_{C_0}$, if the arc $C \to C_0$ belongs to
$\Omega_3(C)$. Define a 2-SAT formula $\Phi_{ij}$ by replacing every constraint $a = b$ by two
clauses $(a \vee \bar{b})$ and $(\bar{a} \vee b)$ and every constraint $a \ne b$ by two clauses
$(a \vee b)$ and $(\bar{a} \vee \bar{b})$.
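
Since every constraint here is an equality or an inequality between two variables, a
formula of this shape can also be decided without a general 2-SAT solver. The Python sketch
below uses a swapped-in technique, a union-find structure with parities, which is equivalent
for constraints of this form only; the function name and the input format (lists of pairs of
variable names) are hypothetical.

```python
def solve_parity_constraints(variables, equal, different):
    """Return a 0/1 assignment satisfying all a = b and a != b constraints, or None."""
    parent = {v: v for v in variables}
    parity = {v: 0 for v in variables}  # value(v) XOR value(parent[v])

    def find(v):
        """Return (root of v, value(v) XOR value(root)), compressing the path."""
        path = []
        while parent[v] != v:
            path.append(v)
            v = parent[v]
        root, acc = v, 0
        for node in reversed(path):  # from the node nearest to the root outwards
            acc ^= parity[node]
            parent[node] = root
            parity[node] = acc
        return root, (parity[path[0]] if path else 0)

    constraints = [(a, b, 0) for a, b in equal] + [(a, b, 1) for a, b in different]
    for a, b, diff in constraints:
        root_a, par_a = find(a)
        root_b, par_b = find(b)
        if root_a == root_b:
            if (par_a ^ par_b) != diff:
                return None  # the constraints contradict each other
        else:
            parent[root_a] = root_b
            parity[root_a] = par_a ^ par_b ^ diff
    # Give every root the value 0; each variable then takes its parity to the root.
    return {v: find(v)[1] for v in variables}
```

A returned assignment plays the role of a satisfying assignment of the formula, and `None`
corresponds to an unsatisfiable one; for example, three pairwise different variables cannot
be assigned consistently.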

Proposition 4.8. If the 2-SAT formula $\Phi_{ij}$ admits a satisfying assignment A, then the
sets $X^-_{ij} = \{C : A(x_C) = 0\}$ and $X^+_{ij} = \{C : A(x_C) = 1\}$ define a partition of
$\mathcal{G}_{ij}$ into two acyclic subgraphs. Conversely, given an $\epsilon$-compatible order
on X, the assignment A defined by setting $A(x_C) = 0$ if C is located in $H_i$, $A(x_C) = 1$ if
C is located in $H_{j-1}$, and $A(x_{C'}) = A(x_{C''})$ if C' and C'' are located in a common
inner hole, is a true assignment for $\Phi_{ij}$. In particular, if $\Phi_{ij}$ is not
satisfiable, then no $\epsilon$-compatible order exists.

Figure 2: To the classification of the arcs incident to a cell C (1-cycle: $C_0 \to C$ belongs
to $\Omega_1(C)$; 2-cycle: $C \to C_0$ belongs to $\Omega_3(C)$)

Proof. Let A be a true assignment of $\Phi_{ij}$ and let the partition
$\{X^-_{ij}, X^+_{ij}\}$ of $\mathcal{G}_{ij}$ be defined as above. Denote by
$\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ the subgraphs induced by $X^-_{ij}$ and
$X^+_{ij}$. (F1) forces every cluster to be included in one set. (F2) implies that the twin
clusters are separated. Hence $\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ do not contain
(G1)-cycles: if C' and C'' are the two cells of a (G1)-cycle, then
$(x_{C'} \vee x_{C''}) \wedge (\bar{x}_{C'} \vee \bar{x}_{C''})$ yields
$A(x_{C'}) \ne A(x_{C''})$. By Lemma 4.4, $\mathcal{G}_{ij}$ does not contain (G2)-cycles. Since
the cells of a (G3)-cycle are contained in the same cluster and each cluster induces an acyclic
subgraph, $\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ do not contain (G3)-cycles as well. Now,
let $\mathcal{G}^+_{ij}$ contain a mixed cycle. Then it also contains an induced mixed cycle
$\mathcal{C}$. From Lemma 4.5 we infer that $\mathcal{C}$ has either one (G2)-arc $C_0 \to C$
or exactly two consecutive (G2)-arcs $C_0 \to C \to C'$. In the first case, we conclude that
$C_0 \to C$ belongs to $\Omega_1(C)$, thus (F3) yields $x_C \ne x_{C_0}$, contrary to the fact
that $A(x_C) = A(x_{C_0}) = 1$. Analogously, in the second case, we deduce that either
$x_C \ne x_{C_0}$ and the arc $C_0 \to C$ belongs to $\Omega_2(C)$ or $x_C = x_{C_0}$ and the
arc $C \to C'$ belongs to $\Omega_3(C)$, whence $x_C \ne x_{C'}$. Then we obtain a contradiction
with the assumption that $A(x_{C_0}) = A(x_C) = A(x_{C'}) = 1$. This shows that the subgraphs
$\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ obtained from the true assignment A of
$\Phi_{ij}$ are acyclic.

Conversely, let $A$ be an assignment obtained from an $\epsilon$-compatible order as defined
in the proposition. We assert that $A$ is a true assignment for $\Phi_{ij}$, i.e., it satisfies the
constraints (F1)-(F4). This is obvious for constraints (F1) and (F2), because if two
cells $C', C''$ belong to the same cluster, then they will be located in the same hole and
we must have $A(x_{C'}) = A(x_{C''})$. If $C'$ and $C''$ belong to distinct twin clusters, then they
must be separated, therefore the unique $\epsilon$-admissible location of $C'$ and $C''$ will be in
different bounding holes, thus $A(x_{C'}) \ne A(x_{C''})$. Now, pick an arc $C_0 \to C$ which belongs
to $\Omega_1(C) \cup \Omega_2(C)$. If $C_0 \to C$ belongs to $\Omega_1(C)$, then there exists a 1-cycle $\mathcal{C}$ passing via
$C_0 \to C$ and intersecting a cluster $\mathcal{K}$. Since all cells of $\mathcal{C}$, except $C$, are heads of (G3)-arcs,
they all belong to $\mathcal{K}$, i.e., they have the same value in the assignment. By Lemma 4.3, $C$
must be separated from $C_0$ (namely, $C$ and $C_0$ must be located in different bounding holes),
showing that $A(x_C) \ne A(x_{C_0})$. If $C_0 \to C$ belongs to $\Omega_2(C)$, then let $\mathcal{C}$ be a 2-cycle passing
via $C_0 \to C$ and intersecting the cluster $\mathcal{K}$ not containing $C$. Additionally, we know that the
arc of $\mathcal{C}$ entering $C_0$ is a (G3)-arc, thus $C_0$ belongs to $\mathcal{K}$. Since $C$ cannot belong to
the twin cluster of $\mathcal{K}$ (this would contradict the fact that the arc entering $C_0$ is a (G3)-arc) and since $C$ does not
belong to $\mathcal{K}$, from Lemma 4.6 we infer that $C_0$ and $C$ are separated, thus $A(x_C) \ne A(x_{C_0})$.
Finally, let $C \to C_0$ belong to $\Omega_3(C)$. Then there exists a 2-cycle $\mathcal{C}$ passing via $C \to C_0$
and intersecting the cluster $\mathcal{K}$, such that $C$ belongs to $\mathcal{K}$ and the arc of $\mathcal{C}$ entering $C$ has
type (G2). Since all cells of $\mathcal{C}$ except $C$ and $C_0$ are heads of (G3)-arcs, they all belong
to $\mathcal{K}$. Since $C$ also belongs to this cluster, by Lemma 4.3 $C_0$ must be separated from the
remaining cells of $\mathcal{C}$, yielding $x_C \ne x_{C_0}$. Hence $A$ satisfies the constraints (F1)-(F4). This
shows, in particular, that if $\Phi_{ij}$ is not satisfiable, then no $\epsilon$-compatible order exists. ■
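
The constraints used in this proof are equalities and inequalities between Boolean variables $x_C$, each of which can be written as a pair of 2-clauses, for instance $x \ne y$ as $(x \vee y) \wedge (\bar{x} \vee \bar{y})$. The following is a minimal sketch of how such a formula could be tested for satisfiability with the standard strongly-connected-components method. The function names `solve_2sat`, `differ` and `equal` and the integer encoding of literals are illustrative assumptions, not the paper's notation, and the sketch does not build the actual lists $\Omega_1(C)$, $\Omega_2(C)$, $\Omega_3(C)$.

```python
def solve_2sat(num_vars, clauses):
    """Solve a 2-SAT instance via strongly connected components (Kosaraju).

    Literals are nonzero ints: +v means variable v is true, -v means it is
    false (v in 1..num_vars). Returns a list of booleans indexed by v-1,
    or None if the formula is unsatisfiable.
    """
    def node(lit):
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a or b) gives the implications (not a -> b) and (not b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: iterative DFS recording vertices in order of completion.
    visited = [False] * n
    finish = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                v = graph[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                finish.append(u)

    # Pass 2: label components on the reversed graph; labels follow a
    # topological order of the condensation of the implication graph.
    comp = [-1] * n
    label = 0
    for s in reversed(finish):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = []
    for v in range(num_vars):
        pos, neg = comp[2 * v], comp[2 * v + 1]
        if pos == neg:
            return None  # a variable is equivalent to its own negation
        assignment.append(pos > neg)
    return assignment


def differ(x, y):
    """Clauses forcing x != y."""
    return [(x, y), (-x, -y)]


def equal(x, y):
    """Clauses forcing x == y."""
    return [(x, -y), (-x, y)]
```

For example, `solve_2sat(3, equal(1, 2) + differ(2, 3))` returns an assignment in which the first two variables agree and the third differs, while three pairwise inequalities on three variables yield `None`.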

Figure 3: Relative location of the cells of $X^+_{k(i+1)}$, $X^-_{ij'}$, and $X^-_{ij}$ ($k < i$, $j' < j$) in $H_i$

4.3. Sorting the cells of $X^-_{ij}$ and $X^+_{ij}$. Let $\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ be the subgraphs of $\mathcal{G}_{ij}$ induced
by the sets $X^-_{ij}$ and $X^+_{ij}$ obtained from the true assignment of the 2-SAT formula $\Phi_{ij}$. We
will locate all cells of $X^-_{ij}$ in the hole $H_i$ and all cells of $X^+_{ij}$ in the hole $H_{j-1}$ of $H_{ij}$. The
elements from two cells $C', C''$ located in the same hole will not be mixed, i.e., $C'$ will be
placed to the right of $C''$, or vice versa. To specify the total order among cells, we use that
$\mathcal{G}^-_{ij}$ and $\mathcal{G}^+_{ij}$ are acyclic, therefore each of them admits a topological order. We compute a
topological order $C_{j_1} \prec C_{j_2} \prec \ldots \prec C_{j_p}$ on the cells of $X^+_{ij}$ and a dual topological order
$C_{i_q} \prec C_{i_{q-1}} \prec \ldots \prec C_{i_1}$ on the cells of $X^-_{ij}$. We locate the cells of $X^+_{ij}$ in $H_{j-1}$ and the
cells of $X^-_{ij}$ in $H_i$ according to these orders. The following two results relate the topological
orders on the cells to the order on the distances between elements from such cells.
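
As an illustration of the sorting step, here is a minimal sketch of computing a topological order of the cells of an acyclic directed graph with Kahn's algorithm. It assumes that an arc (tail, head) requires the tail to precede the head, and it takes the dual topological order to be the reverse of a topological order; both are assumptions of this sketch, and the names `topological_order` and `dual_topological_order` are hypothetical.

```python
from collections import deque


def topological_order(cells, arcs):
    """Kahn's algorithm on a directed graph given as (tail, head) pairs.

    Returns a list in which every tail precedes its head, or None if the
    graph contains a directed cycle.
    """
    successors = {c: [] for c in cells}
    indegree = {c: 0 for c in cells}
    for tail, head in arcs:
        successors[tail].append(head)
        indegree[head] += 1

    queue = deque(c for c in cells if indegree[c] == 0)
    order = []
    while queue:
        c = queue.popleft()
        order.append(c)
        for nxt in successors[c]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # If some cell was never released, it lies on or behind a cycle.
    return order if len(order) == len(cells) else None


def dual_topological_order(cells, arcs):
    """Reverse of a topological order (assumed meaning of 'dual' here)."""
    order = topological_order(cells, arcs)
    return None if order is None else order[::-1]
```

Returning `None` on a cycle mirrors the role acyclicity plays in the text: the sorting step is only applied to subgraphs already known to be acyclic.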

Lemma 4.9. Let $C', C''$ be two cells of $X^+_{ij}$. If $C' \prec C''$ in the topological order, then for
any $y \in C'$, $z \in C''$ and $x \in X^-_{ij}$, we have $d_y \le_4 d_z$ and $d(x, y) \le_{16} d(x, z)$.

Proof. Since $C', C''$ belong to $X^+_{ij}$, they are not connected by (G1)-arcs. Since $C' \prec C''$ in
the topological order, there is no arc from $C''$ to $C'$. As $C'' \to C'$ is not a (G2)-arc, we
must have $d_z \ge_4 d_y$. As $C'' \to C'$ is not a (G3)-arc, we obtain $d(x, y) \le_{16} d(x, z)$. ■

Lemma 4.10. Let $C, C', C''$ be three distinct cells of the graph $\mathcal{G}_{ij}$. If the algorithm returns
the total order $\prec$ and $C \prec C' \prec C''$, then for any $x \in C$, $y \in C'$, $z \in C''$ or $x, y, z \in C \cup C'$
and $x \prec y \prec z$, we have $d(x, z) \ge_{16} \max\{d(x, y), d(y, z)\}$.
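
An inequality of this shape can be checked directly on a candidate total order. The sketch below scans all ordered triples and reports the first one for which $d(x, z)$, relaxed by an additive slack, falls below $\max\{d(x, y), d(y, z)\}$. The exact slack corresponding to the subscript in $\ge_{16}$ is defined earlier in the paper and is not restated here, so it is left as a parameter; the function name `first_violation` and the matrix representation of $d$ are assumptions of this sketch.

```python
from itertools import combinations


def first_violation(order, d, slack):
    """Find a triple x, y, z (in this relative order) violating
    d[x][z] + slack >= max(d[x][y], d[y][z]).

    order: list of element indices, left to right
    d:     symmetric matrix (list of lists) of dissimilarities
    slack: additive tolerance, supplied by the caller
    Returns the first violating triple, or None if there is none.
    """
    # combinations() preserves the left-to-right order of `order`.
    for x, y, z in combinations(order, 3):
        if d[x][z] + slack < max(d[x][y], d[y][z]):
            return (x, y, z)
    return None
```

The scan takes cubic time in the number of elements, which is adequate for checking a returned order but is not part of the algorithm described in the paper.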

After fixing the relative position of each cell $C$ of $X_{ij}$, we make a recursive call to $C$.
For this, we update the canonical order $\preceq$ in the following way: if $C$ is located in $X^+_{ij}$, we
set $x \preceq^+ y$ if $x \preceq y$, otherwise, if $C$ is located in $X^-_{ij}$, we set $x \preceq^- y$ if $y \preceq x$. Since $\preceq^+$
and $\preceq^-$ are dual, if we apply to them the "closing" rules, we will obtain two dual partial
orders, denoted also by $\preceq^+$ and $\preceq^-$. The restriction on $C$ of every $\epsilon$-compatible order $\prec$
on $X$ is an extension of $\preceq^+$ or $\preceq^-$: since all elements of $C$ will be placed in the same hole,
either $O_{j+1} \prec C$ or $C \prec O_j$. If $O_{j+1} \prec C$, then $x \prec y$ for all $x, y \in C$ such that $x \preceq^+ y$. Hence
$\prec$ is a linear extension of $\preceq^+$. Therefore, if the recursive call to a cell $C$ returns the answer
"not", then no $\epsilon$-compatible total order on $X$ exists. Otherwise, it returns a total order on $C$,
which is $16\epsilon$-compatible by the induction hypothesis. Then, the total order between the cells
of $\mathcal{G}_{ij}$ and the total orders on cells are concatenated to give a single total order $\prec$ on $X_{ij}$.

4.4. Defining the total order on $X_i$. Recall that $X_i$ is the set of all elements of $X^\circ$
located in the hole $H_i$. According to our algorithm, $X_i$ is the disjoint union of all sets $X^-_{ij}$
($j > i + 1$) and $X^+_{k(i+1)}$ ($k < i$). We just defined a total order between the cells of each of the
sets $X^-_{ij}$, $X^+_{k(i+1)}$, and applying recursion we defined a total order on the elements of each
cell. To obtain a total order on the whole set $X_i$ it remains to define a total order between
the sets $X^-_{ij}$ ($j > i + 1$) and $X^+_{k(i+1)}$ ($k < i$). For this, we locate each $X^+_{k(i+1)}$ ($k < i$) to the
left of each $X^-_{ij}$ ($j > i + 1$). Given two sets $X^+_{k(i+1)}$, $X^+_{k'(i+1)}$ ($k, k' < i$), we locate $X^+_{k(i+1)}$ to the
left of $X^+_{k'(i+1)}$ if and only if $k < k'$, i.e., iff $H_{k(i+1)} \supset H_{k'(i+1)}$. Analogously, given $X^-_{ij}$, $X^-_{ij'}$
($j, j' > i + 1$), we locate $X^-_{ij'}$ to the right of $X^-_{ij}$ if and only if $j' < j$, i.e., iff $H_{ij'} \subset H_{ij}$.
This location is justified by Proposition 3.2 and is illustrated in Fig. 3.
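
The placement rule inside a single hole amounts to two sorts. The following sketch returns the left-to-right arrangement of the sets located in $H_i$, with each set represented only by a label; the function name `order_sets_in_hole` and the tuple labels are illustrative assumptions.

```python
def order_sets_in_hole(i, plus_ks, minus_js):
    """Left-to-right arrangement of the sets placed in the hole H_i.

    plus_ks:  indices k < i of the sets X^+_{k(i+1)}
    minus_js: indices j > i + 1 of the sets X^-_{ij}
    Each set is represented by a label (sign, first index, second index).
    """
    if any(k >= i for k in plus_ks):
        raise ValueError("every k must satisfy k < i")
    if any(j <= i + 1 for j in minus_js):
        raise ValueError("every j must satisfy j > i + 1")

    # All X^+ sets come first, in increasing order of k.
    left = [("+", k, i + 1) for k in sorted(plus_ks)]
    # X^-_{ij'} lies to the right of X^-_{ij} iff j' < j, so j decreases.
    right = [("-", i, j) for j in sorted(minus_js, reverse=True)]
    return left + right
```

For $i = 3$, the sets $X^+_{04}, X^+_{14}, X^+_{24}$ are followed by $X^-_{37}, X^-_{36}, X^-_{35}$, so that sets attached to larger segments lie further from the interior of the hole's neighbours in the way the text prescribes.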

5. The algorithm and its performance guarantee 

We have collected all necessary tools to describe the algorithm. It consists of three
procedures $l_\infty$-Fitting_by_Robinson, Refine, and Partition_and_Sort. The main procedure
$l_\infty$-Fitting_by_Robinson constructs the sorted list A of feasible values for the optimal error
$\epsilon^*$. Its entries are considered in a binary search fashion and the algorithm returns the
smallest value $\epsilon \in$ A occurring in this search for which the answer "not" is not returned
(i.e., the least $\epsilon$ for which a $16\epsilon$-compatible total order on $X$ exists). To decide if, for a
given $\epsilon$, such an order exists, the procedure Refine($X$, $\preceq$, $\epsilon$) constructs (and/or updates)
the canonical partial order $\preceq$ and computes a maximal chain $P$ of $(X, \preceq)$. For each element
$x \in X^\circ := X \setminus P$, Refine computes the set $AH(x)$ of all $x$-holes which participate in $(x, y, 1)$-
admissible locations for all $y \in X^\circ$ and defines the segment $H(x)$. For each pair $i < j - 1$,
Refine constructs the set $X_{ij}$ and makes a call of the procedure Partition_and_Sort($X_{ij}$),
which returns the bipartition $\{X^-_{ij}, X^+_{ij}\}$ of $X_{ij}$ and a total order on the cells of $X^-_{ij}$ and $X^+_{ij}$.
Then Refine concatenates in a single total order on cells the total orders on cells coming
from different sets assigned to the same hole. After this, Refine is recursively applied to each
cell occurring in some graph $\mathcal{G}_{ij}$. The returned total orders on cells are concatenated into a
single total order $\prec$ on $X$ according to the total orders between cells and between holes; then
$\prec$ is returned by the algorithm $l_\infty$-Fitting_by_Robinson. The procedure Partition_and_Sort
constructs the graphs Cij and Using these graphs, Xij is partitioned into blocks and 
cells, then graph $S_{ij}$ and its clusters are constructed. Using the cells, the directed graph $\mathcal{G}_{ij}$
is constructed. If $S_{ij}$ is not bipartite or $\mathcal{G}_{ij}$ contains (G3)-cycles, then Partition_and_Sort
returns the answer "not". Otherwise, for each cell $C$ and each cluster $\mathcal{K}$, it tests if there
exists a 1-cycle and/or a 2-cycle passing via $C$ and intersecting $\mathcal{K}$. Consequently, for each
cell $C$, the lists $\Omega_1(C)$, $\Omega_2(C)$, and $\Omega_3(C)$ of (G2)-arcs are computed. These lists are used
to construct the 2-SAT formula $\Phi_{ij}$, which is solved by the algorithm of [2]. If $\Phi_{ij}$ admits
a true assignment $A$, then $X^-_{ij} = \{C : A(x_C) = 0\}$ and $X^+_{ij} = \{C : A(x_C) = 1\}$ define a
bipartition of $X_{ij}$ into two acyclic subgraphs $\mathcal{G}^-_{ij}$, $\mathcal{G}^+_{ij}$ of $\mathcal{G}_{ij}$. Then Partition_and_Sort locates
the cells from $X^+_{ij}$ in the hole $H_{j-1}$ according to the topological order of the acyclic graph
$\mathcal{G}^+_{ij}$ and it locates the cells from $X^-_{ij}$ in the hole $H_i$ according to the dual topological order
of $\mathcal{G}^-_{ij}$. Note that if at some stage Refine or Partition_and_Sort returns the answer "not",
then there does not exist any $\epsilon$-compatible total order on $X$ and the current value of $\epsilon$
is too small. The total complexity of the algorithm is O(n^ log n). We now formulate the
main result of our paper:

Theorem 5.1. For $\epsilon \in$ A, if the algorithm returns the answer "not", then the dissimilarity
$d$ is not $\epsilon$-Robinson, else, it returns a $16\epsilon$-compatible total order $\prec$ on $X$. In particular, the
algorithm is a factor 16 approximation algorithm for $l_\infty$-FITTING-BY-ROBINSON.

Proof. First, note that no $\epsilon$-compatible order exists in all cases when the algorithm returns
the answer "not". Indeed, Lemma 4.4, Propositions 3.2 and 4.10 cover all such cases except
the case when this answer is returned by a recursive call. In this case, the induction
assumption implies that no $\epsilon$-compatible total order on $C$ extending $\preceq^+$ (and therefore its
dual $\preceq^-$) exists. Then we infer that no $\epsilon$-compatible order on $X$ exists either.

Now, let the algorithm return a total order $\prec$. Suppose by induction assumption that $\prec$
is $16\epsilon$-compatible on each cell to which a recursive call is applied. On the chain $P$, the total
order $\prec$ coincides with $\preceq$, therefore $\prec$ is $\epsilon$-compatible on $P$. Moreover, $\prec$ is $\epsilon$-compatible
on $P \cup \{x\}$ for any $x \in X^\circ$, because every element $x$ is located in a bounding hole of
$H(x)$ which is $x$-admissible. Finally, notice that $\prec$ is $12\epsilon$-compatible on $P \cup \{x, y\}$ for any
$x, y \in X^\circ$ because by Proposition 3.2 the bounding hole of $H(x)$ and the bounding hole of
$H(y)$ into which $x$ and $y$ are located define an $(x, y, 12)$-admissible pair. To prove that $\prec$ is
$16\epsilon$-compatible on the whole set $X$, it suffices to show that $d(x, z) \ge_{16} \max\{d(x, y), d(y, z)\}$
for any three elements $x, y, z \in X$ such that $x \prec y \prec z$. From the previous discussion, we can
suppose that $x, y, z \in X^\circ$. For this, we distinguish the Cases (H1)-(H5) according to
the mutual location of the segments $H(x)$ and $H(z)$, and in each case we show the required
inequality. The respective case analysis is given in [9].

■ 
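
The outer loop of the main procedure, as described at the start of this section, is a binary search over the sorted list of candidate error values. A minimal sketch follows, in which `try_epsilon` is a hypothetical stand-in for the call to Refine: it is assumed to return a total order when it succeeds and `None` when the answer is "not". Neither the name nor this calling convention comes from the paper.

```python
def smallest_accepted(candidates, try_epsilon):
    """Binary search over a sorted list of candidate error values.

    candidates:  values in increasing order
    try_epsilon: callable returning a total order, or None for "not"
                 (hypothetical stand-in for the paper's Refine procedure)
    Returns (epsilon, order) for the smallest probed value that was
    accepted, or None if every probed value was rejected.
    """
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        order = try_epsilon(candidates[mid])
        if order is None:
            # Answer "not": this value is too small, search larger ones.
            lo = mid + 1
        else:
            # Accepted: remember it and look for a smaller accepted value.
            best = (candidates[mid], order)
            hi = mid - 1
    return best
```

As in the text, the value returned is the smallest one accepted among those probed by the search, and the number of calls to `try_epsilon` is logarithmic in the length of the list.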